We propose an optimizing transformation which reduces program runtime 
at the expense of program size by eliminating conditional jumps. 



1 Preface 
1.1 Motivation 

In a variety of cases, code is written in such a way that a conditional expression is 
evaluated several times during one execution. Situations where this may happen include the 
following: 

• Repeated use of the ternary operator (•?• : •) with a common conditional expression. 

• An if-then-else statement inside a loop, where the condition is loop invariant. 

• Use of macros or inlined functions provided by a library that include conditional 
expressions. 

• Conditional jumps implicitly inserted by the compiler due to short-circuit logic. 

• Naive code mechanically generated from another source via tools such as parser 
generators, or compilers of higher-level languages that compile to C and then invoke a 
C compiler. 

• Conditionals introduced by earlier compilation passes, such as the Partial Dead 
Code Elimination (PDE) pass conceived by Bodík and Gupta [BG97], are likely to make 
other conditionals redundant. In fact, the PDE paper recommends a "branch elimination" 
step without giving the details of this. CECD can serve as an implementation of 
this step. 

In some of these cases the programmer might be able to eliminate the redundant 
conditional expression by hand, but often at the cost of less readable code or repetition, 
such as two instances of the loop mentioned in the second bullet. In other cases, such as 
the library-provided macros or the generated code, it is not feasible to expect the source 
code to be free of redundant conditionals. Therefore it is desirable that an optimizing 
compiler can perform this transformation. 

Furthermore, this transformation not only reduces execution time but can also enable 
further optimizations: If the conditional expression is of the form v == c for a variable v 
and a constant c, a constant propagation pass can replace v by c in the then-branch, 
which has been enlarged by our optimization. Also, modern computer architectures, due 
to long pipelines, perform better if fewer conditional jumps occur in the code. 
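
As an illustration of the loop case from the second bullet, the following minimal sketch shows the same computation before and after duplicating the loop by hand. The function names and the arithmetic in the loop bodies are hypothetical and only serve to make the fragment runnable; the point is that the invariant test v == c moves out of the loop at the price of a second copy of the loop.

```python
def scale_original(xs, v, c):
    """Original form: the loop-invariant test v == c is evaluated once per element."""
    out = []
    for x in xs:
        if v == c:
            out.append(x * v)
        else:
            out.append(x + v)
    return out


def scale_duplicated(xs, v, c):
    """Duplicated form: the test is evaluated once, but the loop exists twice."""
    out = []
    if v == c:
        for x in xs:
            # Inside this copy v == c is known to hold, so a later
            # constant propagation pass could replace v by c here.
            out.append(x * v)
    else:
        for x in xs:
            out.append(x + v)
    return out
```

Both functions return the same list for every input; only the number of evaluations of the condition and the code size differ.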

1.2 Outline 

In the next section, we explain when a given region to duplicate is valid and how to 
perform the conditional elimination. Aiming for a very clear, simple and homogeneous 
presentation, we describe the algorithm in a very general setting. This will possibly 
introduce dead code. An implementation would either run a dead code elimination pass 
afterwards or refine the given algorithm as required. The transformation is demonstrated 
by example. 

Section 3 discusses which properties the region should satisfy for the optimization to 
actually have a positive effect, and how to avoid useless code duplication. 

To decide whether to perform the optimization, we give in subsection 3.2 a simple 
heuristic that selects a region to be duplicated and decides whether the optimization 
should be performed, weighing the (runtime) benefits against the (code size) cost. 
We also show that a slightly more sophisticated approach, which takes profiling information 
into account, becomes NP-hard. 

Data flow equations for the properties discussed in the preceding two sections are given 
in section 4. 

2 Conditional elimination 



Let e be an expression which occurs as the condition of a conditional branch in 
the control flow graph (CFG) of a program, and let $v_1, v_2, \ldots$ be the operands of the 
expression. 

Let D be a region of the control flow graph, i.e. $D \subseteq BB$ where BB is the set of basic 
blocks in the control flow graph. 

The region D is valid if and only if no basic block in D contains an assignment to 
any of the operands $v_1, v_2, \ldots$ of e. 
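
The validity condition is a purely local check, which could be sketched as follows. The representation is an assumption made for this illustration: `assigned_vars` is a hypothetical mapping from a basic block name to the set of variables assigned in that block.

```python
def is_valid_region(region, assigned_vars, operands):
    """Return True if no block in `region` assigns to an operand of e.

    region        -- iterable of basic block names (the candidate D)
    assigned_vars -- dict: block name -> set of variables assigned in it
                     (hypothetical representation, blocks without entry
                     are treated as assigning nothing)
    operands      -- iterable of the operand names v_1, v_2, ... of e
    """
    operand_set = set(operands)
    return all(
        operand_set.isdisjoint(assigned_vars.get(bb, ()))
        for bb in region
    )
```

For example, with `assigned_vars = {"b1": {"x"}, "b2": {"v1"}}` and operands `v1`, `v2`, the region `{"b1"}` is valid while `{"b1", "b2"}` is not.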

The parameters of the optimization are the conditional expression e and any valid set 
D. The transformation is performed in three steps, where the first step is generic code 
duplication which does not yet consider the conditional expression, the second step 
rewires some edges to make the other copies reachable and the last step removes the 
redundant conditionals. Each step preserves the meaning of the program. 

1. (Code duplication) For every basic block $bb_i \in D$, create three copies: the true 
copy $bb_i^t$, the false copy $bb_i^f$ and the unknown copy $bb_i^u$. The edges of the graph are 
modified as follows: 

• An edge between $bb_i \notin D$ and $bb_j \notin D$ is left unchanged. 

• An edge between $bb_i \in D$ and $bb_j \in D$ is reproduced by the three edges $bb_i^t$ 
to $bb_j^t$, $bb_i^f$ to $bb_j^f$ and $bb_i^u$ to $bb_j^u$. 

• An edge between $bb_i \in D$ and $bb_j \notin D$ is reproduced by the three edges $bb_i^t$ 
to $bb_j$, $bb_i^f$ to $bb_j$ and $bb_i^u$ to $bb_j$. 

• An edge between $bb_i \notin D$ and $bb_j \in D$ is changed to an edge from $bb_i$ to $bb_j^u$. 

2. (Conditional evaluation) For every conditional edge from a block $bb_i$ that depends on 
e being true (false) at the end of $bb_i$ and whose target is a copy of a node $bb_j \in D$, 
replace it by an edge from $bb_i$ to $bb_j^t$ ($bb_j^f$). 

3. (Conditional elimination) For every basic block $bb_i \in D$ which has a conditional 
branch depending on e, remove the condition in $bb_i^t$ ($bb_i^f$), 
unconditionally follow the true (false) case and remove the other edge. 
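
The three steps can be sketched on a small graph representation. This is a minimal sketch under stated assumptions, not a definitive implementation: the CFG is assumed to be a list of unique `(src, dst, label)` edges, where the label is `"T"` or `"F"` for an edge that depends on e being true or false and `None` for any other edge. Copies of a block `b` in D are represented as the tuples `(b, "t")`, `(b, "f")` and `(b, "u")`. The names `cecd` and `reachable` are hypothetical.

```python
def cecd(edges, region):
    """Apply code duplication, conditional evaluation and elimination."""
    # Step 1: code duplication.
    step1 = []
    for src, dst, label in edges:
        if src in region and dst in region:
            step1 += [((src, x), (dst, x), label) for x in "tfu"]
        elif src in region:
            step1 += [((src, x), dst, label) for x in "tfu"]
        elif dst in region:
            step1.append((src, (dst, "u"), label))
        else:
            step1.append((src, dst, label))

    # Step 2: conditional evaluation. Edges that depend on e and lead
    # to a copy are redirected to the true or false copy.
    step2 = []
    for src, dst, label in step1:
        if label in ("T", "F") and isinstance(dst, tuple):
            dst = (dst[0], "t" if label == "T" else "f")
        step2.append((src, dst, label))

    # Step 3: conditional elimination. In true (false) copies the
    # matching edge becomes unconditional, the other edge is removed.
    result = []
    for src, dst, label in step2:
        in_known_copy = isinstance(src, tuple) and src[1] in ("t", "f")
        if in_known_copy and label in ("T", "F"):
            if label.lower() != src[1]:
                continue
            label = None
        result.append((src, dst, label))
    return result


def reachable(edges, entry):
    """Nodes reachable from `entry`; the remaining copies are dead code."""
    seen, stack = {entry}, [entry]
    while stack:
        node = stack.pop()
        for src, dst, _ in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen
```

As a usage example, consider a block `a` that branches on e to `b` (true) and `c` (false), both of which continue to `d`, which branches on e again to `x` and `y`. With D = {b, c, d}, the true copy of `d` ends up with a single unconditional edge to `x`, and no unknown copy is reachable from `a`.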

This algorithm is correct and safe. For correctness, consider an execution path. If the 
path does not pass any node in D, it is not altered by the above algorithm. If the path 
passes through D, but only through unknown copies, it is also not altered. If the path 
eventually reaches a true (false) copy of a node, it must be because of an edge altered in 
step 2. At that point of execution, the value of e is known to be true (false), and because 
D is valid, it remains so until the execution path leaves the region D. Any conditional 
jump skipped because of step 3 is therefore behaving exactly as in the original execution 
path. 

Safety follows from the fact that we only copy nodes and remove the evaluation of 
conditionals, so no new instructions are added along any path. 

Figure 2: Example control flow graph before CECD 



Example 

Consider the code fragment in Figure 1 (leaving out any unrelated assignments or 
expressions). The corresponding control flow graph is given in figure 2. The largest 
valid region is marked, as well as the largest region of useful nodes. Applying the 
algorithm with D set to the region of useful nodes, after step 1 we obtain the graph 
shown in figure 3. At this point, the true and false copies are not reachable yet. Steps 
2 and 3 modify the edges related to conditionals on e, and we reach figure 4. This 
contains a lot of dead code. Removing this in a standard dead code removal pass, we 
reach the final state in figure 5. It can clearly be seen that on every path from entry 
to exit, the conditional e is evaluated at most once. Also the issue of a while-loop 
occurrence (in contrast to the optimizer-friendly do-while-loop) is gracefully taken 
care of. 

    if ... then 
        if e then ... else ... 
    else 
        e := ... 
    end 
    while ... do 
        if e then ... else ... 
    end 

Figure 1: Example code 

Figure 3: Example control flow graph after code duplication of useful nodes 




Figure 4: Example control flow graph after conditional evaluation and elimination 







Figure 5: Example control flow graph after conditional evaluation and elimination and 
dead code elimination 

3 The region of duplication 



The above algorithm works for any valid region, and validity is a simple local property 
that is easily checked. But not all valid regions are useful. For example, entry nodes $bb_i$ 
of the region where no incoming edge depends on e would be duplicated, but only $bb_i^u$ 
would be reachable. Similarly, exit nodes of the region that do not have a conditional 
evaluation of e would be copied for no gain. 

3.1 Usefulness 

Therefore, we define a node $bb_i$ in a valid region D to be useless if 

• on all paths leading to bbi, there is no conditional evaluation of e followed only by 
nodes in D or 

• no path originating from $bb_i$ reaches a conditional evaluation of e before it leaves 
the region D. 

A node $bb_i \in D$ that is not useless is useful. 

Uselessness is, in contrast to validity, not a property of the basic block alone but defined 
with respect to the chosen region D. A basic block may be useless in D but not so in a 
different region D'. But the property is monotone: If $D' \subseteq D$ and $bb_i$ is useful in D', 
then it is also useful in D. 

3.2 Evaluation of a region 

For a given conditional expression, there are many possible regions of duplication, and 
even if we only consider fully useful regions, their number might be exponential in the size 
of the graph. Therefore we need a heuristic that selects a sensible region or decides that 
no region is good enough to perform CECD. We split this decision into two independent 
steps: Region Selection, where the best region for a particular conditional, for some 
meaning of "best", is chosen, and Region Evaluation, where it is decided whether CECD 
should be performed for the selected region. 

These decisions have to depend on the intended use of the code. Code for an embedded 
system might have very tight size requirements and large regions of duplication would 
be unsuitable, whereas code written for massive numerical calculations may be allowed 
to grow quite a bit if it removes instructions from the inner loops. 

At this point, we suggest a very simple heuristic for Region Selection: To cover as 
many execution paths as possible, we just pick the largest valid region consisting of 
useful nodes. The heuristic for Region Evaluation expects one parameter k, which is the 
number of additional instructions that the program is allowed to grow by for one conditional 
to be removed. Together, this amounts to the following steps being taken: 

1. Let D be the largest valid region consisting only of useful nodes. 

2. Let $R^t$, $R^f$ resp. $R^u$ be the sets of those basic blocks in D whose true, false resp. 
unknown copy will be reachable after CECD. 

3. Let n be the number of basic blocks in D that contain a conditional evaluation of 
e, i.e. the number of redundant conditionals. 

4. If 

\[
\sum_{bb_i \in R^t} S(bb_i) + \sum_{bb_i \in R^f} S(bb_i) + \sum_{bb_i \in R^u} S(bb_i) - \sum_{bb_i \in D} S(bb_i) \le n \cdot k,
\]

where k is a user-defined parameter and $S(bb_i)$ is the number of instructions in 
the basic block $bb_i$, perform CECD on D, otherwise do not perform CECD for this 
conditional expression. 
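
Step 4 amounts to a single comparison once the sets are known. The following is a minimal sketch, assuming the block sizes are available as a dictionary `size` from block name to instruction count; the function names are hypothetical.

```python
def code_growth(size, region, r_true, r_false, r_unknown):
    """Instructions added by CECD: all reachable copies minus the originals."""
    copies = (
        sum(size[bb] for bb in r_true)
        + sum(size[bb] for bb in r_false)
        + sum(size[bb] for bb in r_unknown)
    )
    return copies - sum(size[bb] for bb in region)


def should_perform_cecd(size, region, r_true, r_false, r_unknown, n, k):
    """Region Evaluation: allow at most k extra instructions per removed conditional."""
    return code_growth(size, region, r_true, r_false, r_unknown) <= n * k
```

For instance, with sizes b = 3, c = 4, d = 1, region {b, c, d}, reachable true copies {b, d}, reachable false copies {c, d} and no reachable unknown copies, the growth is 4 + 5 + 0 - 8 = 1 instruction, so the transformation is accepted for n = 1 whenever k is at least 1.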

A number of improvements to this scheme come to mind: 

• The selection heuristic should consider subsets of the largest valid and useful re- 
gions as well. 

• It should give different weights to conditionals that are completely removed and 
conditionals that are only partially removed. 

• Removal of conditionals in inner loops should allow for a larger increase of code 
size. 

• Given sufficiently detailed execution traces, a more exact heuristic can be 
implemented. In the next section we see that this easily leads to an NP-hard problem. 

3.3 NP-hardness of a profiling-based Region Selection heuristic 

A straightforward extension of the above Region Selection heuristic that takes profiling 
data in the form of execution traces into account would maximize the sum 
$\sum_{bb_i \in E} f(bb_i)$, where E is the set of basic blocks containing an eliminated 
conditional and $f(bb_i)$ is the number of paths in the execution traces where the 
conditional in $bb_i$ would be eliminated due to CECD. For simplicity, we assume that an 
occurrence of a conditional expression does not contribute to the size $S(bb_i)$ of a 
basic block. 

If we have an algorithm that selects the optimal region, we can solve the 0-1 knapsack 
problem, which is NP-complete in its decision version. The specification of this problem 
is as follows: 

Given n items with weight $w_i \in \mathbb{N}$ and value $v_i \in \mathbb{N}$, i = 1, ..., n, and a 
bound $W \in \mathbb{N}$, find a selection of items $X \subseteq \{1, \ldots, n\}$ that maximizes the 
sum $\sum_{i \in X} v_i$ under the constraint $\sum_{i \in X} w_i \le W$. 

Given such a problem, we construct a control flow graph and profiling data as follows: 

• The entry node is $bb_s$, which contains a conditional expression e. Both conditional 
branches point to the node $bb_r$. 

• There is one exit node $bb_e$ with a conditional expression e. 

• The node $bb_r$ is the root of a binary tree of basic blocks. The inner nodes contain no 
instructions but conditional jumps with conditional expressions that are pairwise 
distinct and distinct from e. 

• The tree contains n leaf nodes $bb_i^l$, i = 1, ..., n. The node $bb_i^l$ contains $w_i$ 
instructions, i.e. $S(bb_i^l) = w_i$, and the profiling data gives a frequency of $v_i$ for the 
execution path passing through $bb_i^l$. 

• The parameter k is chosen to be W. 

A valid and useful region of duplication D in this CFG corresponds to a subset 
$X \subseteq \{1, \ldots, n\}$ and, if non-empty, includes $bb_e$, $bb_i^l$ for $i \in X$ and the nodes 
connecting $bb_r$ with those leaf nodes. Because $bb_s$ dominates all nodes in D, no unknown 
copies will be generated, and both true and false copies are reachable. The inner nodes of 
the binary tree and $bb_e$ only contain conditional expressions and thus do not contribute 
to the size of the duplicated region. Only one redundant conditional occurs, hence the 
count n from step 3 of the heuristic equals 1 (not to be confused with the number of 
items n). The number of executions of $bb_e$ where the conditional is eliminated is exactly 
the number of execution paths that pass through one of the leaf nodes in D. Therefore, the 
constraint imposed by the Region Evaluation heuristic becomes 

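\[
\begin{aligned}
& \sum_{bb_i \in R^t} S(bb_i) + \sum_{bb_i \in R^f} S(bb_i) + \sum_{bb_i \in R^u} S(bb_i) - \sum_{bb_i \in D} S(bb_i) \le n \cdot k \\
\iff\ & \sum_{i \in X} S(bb_i^l) + \sum_{i \in X} S(bb_i^l) + 0 - \sum_{i \in X} S(bb_i^l) \le 1 \cdot k \\
\iff\ & \sum_{i \in X} w_i \le W
\end{aligned}
\]

and the term to be optimized can be transformed as follows: 

\[
\sum_{bb_i \in E} f(bb_i) = f(bb_e) = \sum_{i \in X} v_i
\]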



This concludes the proof of NP-hardness of this profiling-based heuristic for CECD. 
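
The correspondence used in the reduction can be checked on small instances. The sketch below is an illustration only, not part of the proof: `best_region` enumerates the subsets X of leaf nodes, treats the sum of the leaf weights as the code growth and the sum of the values as the number of eliminated evaluations, and keeps the best region that satisfies the Region Evaluation constraint with one redundant conditional and k = W. It is compared against a textbook dynamic program for 0-1 knapsack. Both function names are hypothetical.

```python
from itertools import combinations


def best_region(weights, values, capacity):
    """Brute-force Region Selection on the CFG constructed from a knapsack instance.

    Returns (benefit, leaves), where `leaves` are the indices of the leaf
    nodes included in the chosen region.
    """
    best = (0, ())
    for r in range(len(weights) + 1):
        for leaves in combinations(range(len(weights)), r):
            growth = sum(weights[i] for i in leaves)  # only leaves have a size
            if growth <= capacity:                    # growth <= 1 * k with k = W
                benefit = sum(values[i] for i in leaves)
                if benefit > best[0]:
                    best = (benefit, leaves)
    return best


def knapsack(weights, values, capacity):
    """Textbook 0-1 knapsack by dynamic programming over capacities."""
    table = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for cap in range(capacity, w - 1, -1):
            table[cap] = max(table[cap], table[cap - w] + v)
    return table[capacity]
```

For the weights 2, 3, 4, the values 3, 4, 5 and W = 5, both functions report an optimal value of 7, reached by selecting the first two leaves.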

The assumption that conditional expressions do not contribute to the size of a node is 
not critical: If they do contribute, then this result can still be obtained by a technical 
modification: Increase k by one and then scale k and the number of instructions in the 
nodes $bb_i^l$ by a factor larger than the number of all conditional expressions occurring. 

4 Data Flow equations 



Three properties of basic blocks have been defined so far: validity, usefulness and, 
for the heuristics, which copies of the block will be present after dead code removal. 
The first one is a purely local property, while the others can be obtained by standard 
data flow analyses. The defining equations are given in this section. Here, succ(i) is the 
set of successor nodes of $bb_i$ in the control flow graph and pred(i) the set of 
predecessors. We assume that nodes with a conditional jump have exactly two successors, 
one for true and one for false. In the equations, + and $\sum$ denote logical disjunction 
and $\cdot$ denotes logical conjunction. 

Local properties: 

• $\mathrm{Valid}_i$: Basic block $bb_i$ does not contain an assignment to an operand of e. 

• $\mathrm{TrueEdge}_{ij}$: An edge $bb_i \to bb_j$ exists and depends on e being true. 

• $\mathrm{FalseEdge}_{ij}$: An edge $bb_i \to bb_j$ exists and depends on e being false. 

• $\mathrm{Expr}_i = \sum_{j \in succ(i)} (\mathrm{TrueEdge}_{ij} + \mathrm{FalseEdge}_{ij})$: e is a conditional expression in $bb_i$. 

Determining the largest valid region D of useful nodes: 

• $\mathrm{Live}_i = \mathrm{Valid}_i \cdot \sum_{j \in pred(i)} (\mathrm{Expr}_j + \mathrm{Live}_j)$ 

• $\mathrm{Antic}_i = \mathrm{Valid}_i \cdot (\mathrm{Expr}_i + \sum_{j \in succ(i)} \mathrm{Antic}_j)$ 

• $D_i = \mathrm{Live}_i \cdot \mathrm{Antic}_i$ 
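
The equations for the region of useful nodes can be solved by a round-robin fixed-point iteration starting from false. The following is a minimal sketch under stated assumptions: the CFG is given as a successor dictionary, and `valid` and `expr` are dictionaries holding the local properties Valid and Expr for every node. The function name `useful_region` is hypothetical.

```python
def useful_region(nodes, succ, valid, expr):
    """Solve Live, Antic and D by iterating until nothing changes.

    nodes -- list of basic block names
    succ  -- dict: block -> list of successor blocks
    valid -- dict: block -> bool (no assignment to an operand of e)
    expr  -- dict: block -> bool (block ends in a branch on e)
    """
    pred = {i: [] for i in nodes}
    for i in nodes:
        for j in succ.get(i, ()):
            pred[j].append(i)

    live = {i: False for i in nodes}
    antic = {i: False for i in nodes}
    changed = True
    while changed:
        changed = False
        for i in nodes:
            # Live_i = Valid_i * sum over predecessors (Expr_j + Live_j)
            new_live = valid[i] and any(expr[j] or live[j] for j in pred[i])
            # Antic_i = Valid_i * (Expr_i + sum over successors Antic_j)
            new_antic = valid[i] and (
                expr[i] or any(antic[j] for j in succ.get(i, ()))
            )
            if (new_live, new_antic) != (live[i], antic[i]):
                live[i], antic[i] = new_live, new_antic
                changed = True
    # D_i = Live_i * Antic_i
    return {i for i in nodes if live[i] and antic[i]}
```

On a graph where `a` branches on e to `b` and `c`, both continue to `d`, and `d` branches on e to `x` and `y`, the result is {b, c, d} if all blocks are valid. The entry `a` is excluded because no evaluation of e precedes it, and `x` and `y` are excluded because no evaluation of e follows them.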

Given a valid region D (which may or may not be obtained using our suggested simple 
heuristic), determining which copies of the nodes therein are reachable: 

• $R^u_i = \sum_{j \in pred(i)} \neg\mathrm{Expr}_j \cdot (\neg D_j + R^u_j)$ 

• $R^t_i = \sum_{j \in pred(i)} (R^t_j + \mathrm{TrueEdge}_{ji})$ 

• $R^f_i = \sum_{j \in pred(i)} (R^f_j + \mathrm{FalseEdge}_{ji})$ 

All given data flow equations are any-path equations and therefore, the values can be 
initialized to false before solving the equations using a standard iterative round-robin or 
worklist approach. 
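
A round-robin solver for the reachable copies could look as follows. This is a sketch under stated assumptions: it reads the three equations as R^u_i = OR over predecessors j of (not Expr_j and (j not in D or R^u_j)), R^t_i = OR over predecessors j of (R^t_j or TrueEdge_ji) and R^f_i = OR over predecessors j of (R^f_j or FalseEdge_ji), evaluated for the nodes of D only. The data representation and the function name `reachable_copies` are hypothetical.

```python
def reachable_copies(region, pred, expr, true_edge, false_edge):
    """Return (R_t, R_f, R_u) as sets of blocks of `region`.

    region     -- set of blocks in D
    pred       -- dict: block -> list of predecessor blocks
    expr       -- dict: block -> bool (block ends in a branch on e)
    true_edge  -- set of (src, dst) pairs taken when e is true
    false_edge -- set of (src, dst) pairs taken when e is false
    """
    r_t = {i: False for i in region}
    r_f = {i: False for i in region}
    r_u = {i: False for i in region}
    changed = True
    while changed:
        changed = False
        for i in region:
            preds = pred.get(i, ())
            new_u = any(
                not expr[j] and (j not in region or r_u[j]) for j in preds
            )
            # Blocks outside D have no copies, hence the default False.
            new_t = any(r_t.get(j, False) or (j, i) in true_edge for j in preds)
            new_f = any(r_f.get(j, False) or (j, i) in false_edge for j in preds)
            if (new_t, new_f, new_u) != (r_t[i], r_f[i], r_u[i]):
                r_t[i], r_f[i], r_u[i] = new_t, new_f, new_u
                changed = True
    return (
        {i for i in region if r_t[i]},
        {i for i in region if r_f[i]},
        {i for i in region if r_u[i]},
    )
```

Because all values start at false and can only switch to true, the loop terminates after at most three changes per block. The three result sets are exactly the inputs that the Region Evaluation heuristic of subsection 3.2 needs.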

5 Future work and conclusions 



While the "how" of CECD is fully understood, the question of "where" and "when", 
i.e. coming up with good heuristics for the selection of the conditional and region of 
duplication, needs much further investigation. Also, experiments with real code have 
yet to be conducted to quantify the benefit and suggest good values for the heuristics' 
parameters. Another possible improvement would be to not only consider syntactically 
equal conditions, but also take algebraic identities into account. 

The simplicity of the CECD transformation and the fact that it can easily handle complex 
control flow indicate that it could be an optimization of general interest. 